Quantitative Protein Dynamics from Dominant Folding Pathways 
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We develop a theoretical approach to the protein folding problem based on out-of-equilibrium stochastic dy- 
namics. Within this framework, the computational difficulties related to the existence of large time scale gaps 
in the protein folding problem are removed and simulating the entire reaction in atomistic details using existing 
computers becomes feasible. In addition, this formalism provides a natural framework to investigate the rela- 
tionships between thermodynamical and kinetic aspects of the folding. For example, it is possible to show that, 
in order to have a large probability to remain unchanged under Langevin diffusion, the native state has to be 
characterized by a small conformational entropy. We discuss how to determine the most probable folding path- 
way, to identify configurations representative of the transition state and to compute the most probable transition 
time. We perform an illustrative application of these ideas, studying the conformational evolution of alanine 
di-peptide, within an all-atom model based on the empiric GROMOS96 force field. 
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A critical part of the protein-folding problem is to under- 
stand its kinetics and the underlying physical processes. To 
this aim, several different theoretical methods have been re- 
cently developed, spanning from analytical approaches!]] 0, 
3] to detailed computer simulations @ IH @]. A major prob- 
lem in simulating the folding process using standard molecu- 
lar dynamics (MD) is the huge gap between the time scale of 
"elementary moves", of the order of 10-100 ps, and that of the 
entire folding process, which ranges from a few microseconds 
for fast-folders 131, up to several seconds or even minutes for 
more complex proteins. This peculiarity of the folding pro- 
cess makes the brute-force molecular dynamics approach too 
demanding, and a substantial part of the efforts in the field of 
protein folding simulation aims at bridging this gap. 

In a recent paper Jit] we have presented a novel theoreti- 
cal framework for investigating the folding dynamics, named 
hereafter Dominant Folding Pathways (DFP), which is based 
on a reformulation in terms of path integrals of the dynam- 
ics described by the Langevin equation. The DFP analysis 
allows to compute rigorously (i.e. without any assumptions 
other than the validity of the underlying Langevin equation) 
the most probable conformational pathway connecting an ar- 
bitrary initial conformation to an arbitrary final conformation. 
The major advantage of the method is the possibility of by- 
passing the computational difficulties associated with the ex- 
istence of different time scales in the problem, while retaining 
the ability to recover information on the time evolution of the 
system. As we shall see, the resulting computational simplifi- 
cation is dramatic and makes it feasible to study the formation 
pattern of conformational structures along the entire folding 
process using realistic all-atom force fields, on available com- 
puters. 

In this Letter we further develop our formalism and we 
present the first DFP simulation performed in full atomistic 
detail. We show how the DFP analysis gives access to impor- 



tant information about the dynamics of the folding process, 
such as the characterization and determination of the transi- 
tion state, and the most probable transition time. In addition, 
we show that in this formalism the native state is characterized 
by a single effective parameter and this leads to an interesting 
relationship between kinetic and thermodynamical quantities. 

Let us begin our discussion by briefly reviewing the key 
concepts of the DFP method, here presented for a simple one- 
dimensional system, without loss of generality. 

The DFP method can be applied to any system described by 
the over-damped Langevin equation 
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where U is the potential energy of the system, T|(f) is a Gaus- 
sian random force with zero average and correlation given by 
(r|(f)r|(f')) = 2£>8(f - *'). Note that in the original Langevin 
equation there is a mass term, mx. However, as shown in @], 
for proteins, this term can be neglected beyond time scales of 
the order of 10~ 13 s. 

The probability of finding the system in a conformation x/ 
at time tj starting from a conformation xi at ti, is a solution 
of the well-known Fokker-Planck Equation, and can be ex- 
pressed as a path-integral: 
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where 
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is called the effective action and 
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is called the effective potential. This quantity measures the 
tendency of a configuration to evolve under Langevin diffu- 
sion. In fact, the probability for the system to remain in the 
same configuration x under an infinitesimal time interval is 
given by 



P(x,dt\x,0)=e- v 'ffM dt 



(5) 



Hence, points of large effective potential are highly unstable 
under Langevin diffusion. 

There is an obvious sum rule: J dxfP(xf,tf\xi,ti) = 1, 
which can be written as 



1 =, 
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>From a saddle-point analysis of this sum-rule, it follows that 
given the initial condition (x;, ?;), the most probable paths con- 
tributing to © satisfy the Euler-Lagrange equations derived 
from the effective Lagrangian L e /f = x 2 /4D + V e ff(x) with 
proper boundary conditions 



d 2 - x = 2p dVeff 
dt 2 dx 

*f = -^U'(x f ) 
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The saddle-point equation ©, which comes from the variation 
of the exponent in (O with respect to the final point Xf, tells 
us which final points dominantly contribute to the sum rule, 
that is which conformations are most likely to be visited at the 
final time, tf. 

The numerical advantage in determining the DFP comes 
from the fact that L e ff is a Lagrangian describing an energy 
conserving dynamics, and therefore it is possible to use the 
Hamilton-Jacobi (HJ) description. With this change of frame- 
work, the total computational cost of the simulation now de- 
pends on the length of the path, rather than on the folding time. 
In the HJ framework, the most probable pathway is obtained 
by minimizing — not just extremizing — the functional 

S H j([x];xi,x f ) = £ f dly/l/D(E ef f + V ef f[x(l)]), (10) 

where dl = \J (dx) 2 is an infinitesimal conformational change 
along the path and the effective energy is given by 



tf — > °o, we know that P(xf,tf\xi,ti) converges to the Boltz- 
mann distribution 



P(x f ,tf\xi,ti) 
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where Z is the partition function of the system. If the reaction 
takes place at a temperature below the folding temperature, 
at large times the system will sample configurations Xf close 
to the global minimum-potential-energy configuration x n , for 
which U'(x n ) =0. 

We shall define the native state as the region of configura- 
tion space which is thermally accessible from the minimum- 
energy conformation x„, i.e. for which energy differences with 
respect to U(x n ) are of the order of IcbT. We can assume that, 
for all configurations Xf in the native state, the potential en- 
ergy can be described in the harmonic approximation: 

U(x f ) » U(x n ) + hj"{x„){x f -x n f (14) 
This equation together with equation ( fT2] i implies 

E eff = ^rlu"(x n ) = -V eff (x n ) (15) 

This equation is quite powerful, since it shows that the effec- 
tive energy does not depend on the specific conformation Xf in 
the native state, and is totally determined by the temperature 
of the heat-bath and by the curvature of the potential energy 
at the minimum-energy point x„. Stated differently, the native 
state belong to a surface in configuration space of constant 
effective energy E e ff. 

This parameter appears in the macroscopic quantities char- 
acterizing the thermodynamics of the native state. As an ex- 
ample, let us discuss the conformational entropy S am f, which 
measures the number of micro-states in the native state, i.e. 
the contribution to the partition function of all the config- 
urations for which U(x) — U(x n ) < k B T, where x„ is the 
minimum-energy configuration: 

Z N (T) = fdxe~^d(k B T-(U(x)-U(x n ))) (16) 



S C onf{T) = --^k B TlnZ N (T) 



(17) 



Expanding the potential energy quadratically around the 
minimum-energy configuration x n we have 



E eff = ^-y e ff(x)- 



(11) 



Since the effective energy is conserved along the DFP, using 
equation © we have 
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Note that, since the diffusion coefficient drops exponentially 
at low temperatures, D = D(T) ~ £>oexp[— E a /k B T], the ef- 
fective energy vanishes in this limit. In the long time limit, 



Z N ~ e "£r / 

J X, 

which gives 
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This equation expresses the intuitive fact that the conforma- 
tional entropy of the native state is small if the minimum of 
the potential energy is very narrow. From Eq.s (t20b and ( fT51 ), 
we can conclude that the effective potential V e ff(x) provides 
a measure of the conformational entropy of any (meta)-stable 
state. 

The effective potential also governs the kinetics of the fold- 
ing. As a result, in this formalism, it is possible to investigate 
the relationship between thermodynamical and kinetic aspects 
of the protein folding reaction. For example, we now show 
that the stability of the native state is related to its confor- 
mational entropy. To this end, let us consider the probability 
for the native state to remain unchanged during an elementary 
time interval dt , i.e. the probability that all points in the native 
state evolve into points which are still in the native state: 



P(native,aff|native,0) = 



dx 



U(x)~U(x n )<k B T 



U(y)-U{x n )<k B T 



dy P(y,dt\x,0) 



(21) 



This quantity, which generalizes the persistence probability 
(01, can be evaluated using Eq. (f2| and expanding the expo- 
nent in the Gaussian approximation. The result, which is quite 
involved and will not be presented here, shows that the persis- 
tence probability of the native state increases for large local 
curvatures of the potential energy near the native state, i.e. for 
V e ff(x) — * Hence, V e /f(x) controls both the stability and 
the conformational entropy of the native state. This implies 
that, in order to have a large probability to remain unchanged 
under Langevin diffusion, the native state has to be character- 
ized by a small conformational entropy. 

In the case of a protein with Hamiltonian H{r\ , . . . ,?n), 
denoting the coordinates of the minimum-energy conforma- 
tion by {// }, and assuming again that in the final stage of 
the folding, the protein samples the native state, we write the 
quadratic expansion around the minimum of H as 

H(n,...,7 N ) = H(ft\...,1$) (22) 
This equation implies the equivalent of equation (TT3T ) 
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(24) 



(25) 



where !H W is the Hessian matrix around the minimum-energy 
conformation. Obviously, such a quantity can be obtained 
either from a normal mode analysis around the native state, 
or equivalently by evaluating the average of the velocities 
v? = (drj/dt) 2 from several short MD simulations around the 
native state. 

To summarize the strategy to find the most probable reac- 
tion paths, we may proceed as follows: (i) Prepare several 



FIG. 1 : Dominant Folding Paths for the C7 a x -^Cl eq (red squares) 
and (Xl — > oc« (blue squares) transitions. In the background, the free 
energy profile for the \|/ and dihedrals is shown (in units of kl/mol). 
Black squares identify the minimum residence time conformations, 
and the white squares the transition states defined by comittement 
analysis. 



initial denatured conformations by running short MD simula- 
tions at high temperature, (ii) Prepare a representative set of 
the native state by making short time MD simulations from the 
minimum-energy configuration. These short time MD simu- 
lations also allow to compute the trace of the Hessian matrix, 
and thus the effective energy E e ff. (iii) Solve the Hamilton- 
Jacobi equations from the denatured conformations to the na- 
tive conformations, using the energy E e ft computed above. 

In order to make quantitative predictions on the folding 
process, we need to show that this framework can be suc- 
cessfully applied to all-atom models, using available com- 
puters. As a first application of this type, we study the ki- 
netics of alanine dipeptide, which is usually the benchmark 
system for the investigation of new simulation methods in 
this field. I 



lUJL whi 



1211 . The force-field employed is GRO- 



MOS96 111411 . while the electrostatics effects mediated by the 
solvent are accounted for by imposing a dielectric permittiv- 
ity z r = 80, leaving more sophisticated implicit descriptions 
of the solvent to forthcoming phenomenological applications. 

In Fig. ([]]) we present the results of the DFP analysis rel- 
ative to two specific transitions (C7 av — >C7 e? and 0Cl — ► (Xr), 
compared with the Free Energy landscape computed by direct 
integration. The values of the two \\t and (j) dihedrals along the 
paths obtained by minimising the effective action are plotted 
on top of the relative free energy map. These simulations were 
performed at temperature T = 300K, and assuming a diffusion 
coefficient D = 0.02A 2 ps~' for all atoms. To determine the 
DFP we have performed 500 cycles of simulated annealing of 
the discretized Hamilton- Jacobi functional J3], followed by a 
refinement stage where Conjugate Gradients were used. The 
effective energy E e ff was estimated running a few ps of MD 
simulation starting from the minimum-energy conformation. 
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We will now show how the DFP analysis can provide valu- 
able information about the dynamics of the transition, and 
about the determination of the transition state al ong the path. 
While other methods with similar purposes, e.g.|6j LLSj], only 
provide a meta-dynamics, the DFP analysis yields information 
on the real-time evolution of the system. Even though time is 
no longer an independent variable of the calculation in the HJ 
formulation, the total time required to perform the transition 
from a conformation Xj to a conformation Xf can be computed 
as 



t f -ti = 



Jxs 



dl- 



1 



y/AD{E eff + VeffW)]) 



(26) 



and the time spent in the neighborhood of each intermediate 
conformation (residence time along the path) is easily derived 
from the differential form of Eq. [26] The computed times 
for the C7ax^C7eq and ocz. — > OCr transitions are 12.0 and 
1 1 .4 ps, respectively. Notice, that this is the most probable 
transition time, and not the mean first passage, or Kramers 
time ifnll . 

An analysis of the residence time along the path shows that 
in each of the two DFP's, there are two points where the con- 
formation of alanine dipeptide has shortest residence time. 
These points, indicated in Fig. (Q]) with black symbols, are lo- 
cated in the proximity of the saddle-points of the free energy 
landscape, as one would expect. 

On the other hand, within the present formalism it is also 
possible to rigorously define the transition state along the path 
in terms of commitment analysis Il5lll6ll . Following Eq. (2), 
once the DFP has been determined, the conformation x ts is 
easily obtained by requiring that the probability in the sad- 
dle point approximation to diffuse back to the initial config- 
uration Xj, P(xj\x ts ) equates that of evolving toward the final 
native configuration x/, P{xf\x ts ). In the saddle-point approx- 
imation, this condition leads to the simple equation: 



U(x f )-U( Xi ) 
2k B T 



■ S H j{[x];x ts ,Xi) - S H j([x\;xts,Xf). (27) 



We want to point out that this definition of x ts neither relies 
on the use of any specific reaction coordinate, nor on the a 
priori knowledge of the free energy landscape, but is purely 
based on the properties of the diffusive dynamics followed by 
the system. Transition states computed using this prescription 
are shown in FigQ] as white points. These results provide a 
clean example of the fact that the definition of transition-state 
in terms of commitment analysis can be used to locate the 
configuration of highest free-energy barrier only in the case 
of two-state transitions. 

In conclusion, in the present work we have developed a new 
theoretical description of the protein folding reaction, based 



on Langevin dynamics. This approach allows for a huge re- 
duction of the computational cost needed for obtaining infor- 
mation on the full reaction pathway. Within this framework, 
all-atom simulations for a dipeptide can be performed in just a 
few minutes on a regular desktop, to be compared with times 
of the order of a week required by standard MD to exctract 
the same amount of information. Moreover, we have shown 
that this theoretical tool provides important new insight into 
the protein folding problem. In fact, it allows to define, char- 
acterize and study the native and transition states and to de- 
termine the transition time at different temperatures. We have 
also exhibited, within this framework, a clear connection be- 
tween the stability of the folded conformation and its small 
conformational entropy. 

Applications of this formalism to the study of the confor- 
mational transitions of small proteins with all-atom models 
and implicit solvent are in progress. 

Calculations were partly performed on the HPC facility 
"Wiglaf" at the Physics Department of the University of 
Trento. 
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